ENCODING OF VIDEO CROSS-FADES USING WEIGHTED PREDICTION

CROSS-REFERENCE TO RELATED APPLICATION 

This application claims the benefit of U.S. Provisional Application Serial No. 
60/430,793 (Attorney Docket No. PU020487), filed December 4, 2002 and entitled
"ENCODING OF VIDEO CROSS-FADES USING WEIGHTED PREDICTION", which 
is incorporated herein by reference in its entirety. 

FIELD OF THE INVENTION 
The present invention is directed towards video encoders, and in particular,
towards an apparatus and method for effectively producing video cross-fades
between pictures.

BACKGROUND OF THE INVENTION 
Video data is generally processed and transferred in the form of bit streams.
Typical video compression coders and decoders ("CODECs") gain much of their
compression efficiency by forming a reference picture prediction of a picture to be
encoded, and encoding the difference between the current picture and the prediction.
The more closely the prediction is correlated with the current picture, the fewer
bits are needed to compress that picture, thereby increasing the efficiency of the
process. Thus, it is desirable for the best possible reference picture prediction to be
formed.

In many video compression standards, including Moving Picture Experts
Group ("MPEG")-1, MPEG-2 and MPEG-4, a motion compensated version of a
previous reference picture is used as a prediction for the current picture, and only the
difference between the current picture and the prediction is coded. When a single
picture prediction ("P" picture) is used, the reference picture is not scaled when the
motion compensated prediction is formed. When bi-directional picture predictions
("B" pictures) are used, intermediate predictions are formed from two different
pictures, and then the two intermediate predictions are averaged together, using
equal weighting factors of (1/2, 1/2) for each, to form a single averaged prediction.

In some video sequences, in particular those with fades, the current picture to
be coded is more strongly correlated to the reference picture scaled by a weighting
factor than to the reference picture itself. The Joint Video Team ("JVT") video
compression standard allows weighting factors and offsets to be sent for each
reference picture. The standard specifies how the decoder will use the weighting
factors, but it does not specify how an encoder might determine an appropriate
weighting factor. For sequences that include cross-fades, determining the
appropriate weighting factors and reference pictures to use is quite difficult.

These and other drawbacks and disadvantages of the prior art are addressed
by an apparatus and method that efficiently compress video cross-fades using JVT
weighted prediction. The end-points of a cross-fade are determined and used as
reference pictures for encoding pictures in the cross-fade region.

An apparatus and method are provided for encoding video signal data for a [...]

[...] which is to be read in connection with the accompanying drawings.


Figure 2 shows a block diagram for a video encoder with implicit reference
picture weighting for video cross-fades;



WO 2004/054225 PCT/US2003/036413

Figure 4 shows a block diagram for a video decoder with explicit reference 
picture weighting for video cross-fades; 

Figure 5 shows a pictorial representation of a video cross-fade between a pair 
of pictures; and 

Figure 6 shows a flowchart for an exemplary encoding process.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

An apparatus and method are disclosed for encoding of video cross-fades
using weighted prediction, including motion vector estimation and adaptive reference
picture weighting factor assignment. In some video sequences, in particular those
with fading, the current picture or image block to be coded is more strongly correlated
to a reference picture scaled by a weighting factor than to the reference picture itself.
Video encoders without weighting factors applied to reference pictures encode fading
sequences very inefficiently. When weighting factors are used in encoding, a video
encoder needs to determine both weighting factors and motion vectors, but the best
choice for each of these depends on the other.

Hence, a method is described to efficiently compress video cross-fades using 
JVT weighted prediction. The end-points of a cross-fade are first determined and 
used as the reference pictures for encoding the pictures in the cross-fade region. 

The present description illustrates the principles of the invention. It will thus be
appreciated that those skilled in the art will be able to devise various arrangements
that, although not explicitly described or shown herein, embody the principles of the
invention and are included within its spirit and scope.

All examples and conditional language recited herein are intended for
pedagogical purposes to aid the reader in understanding the principles of the
invention and the concepts contributed by the inventor to furthering the art, and are to
be construed as being without limitation to such specifically recited examples and
conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments
of the invention, as well as specific examples thereof, are intended to encompass
both structural and functional equivalents thereof. Additionally, it is intended that
such equivalents include both currently known equivalents as well as equivalents
developed in the future, i.e., any elements developed that perform the same function,
regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the
block diagrams herein represent conceptual views of illustrative circuitry embodying 
the principles of the invention. Similarly, it will be appreciated that any flow charts, 
flow diagrams, state transition diagrams, pseudocode, and the like represent various 
processes which may be substantially represented in computer readable media and 
so executed by a computer or processor, whether or not such computer or processor 
is explicitly shown. 

The functions of the various elements shown in the figures may be provided 
through the use of dedicated hardware as well as hardware capable of executing 
software in association with appropriate software. When provided by a processor, 
the functions may be provided by a single dedicated processor, by a single shared 
processor, or by a plurality of individual processors, some of which may be shared. 
Moreover, explicit use of the term "processor" or "controller" should not be construed
to refer exclusively to hardware capable of executing software, and may implicitly
include, without limitation, digital signal processor ("DSP") hardware, read-only
memory ("ROM") for storing software, random access memory ("RAM"), and
non-volatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly,
any switches shown in the figures are conceptual only. Their function may be carried
out through the operation of program logic, through dedicated logic, through the
interaction of program control and dedicated logic, or even manually, the particular
technique being selectable by the implementer as more specifically understood from
the context.

In the claims hereof, any element expressed as a means for performing a
specified function is intended to encompass any way of performing that function
including, for example, a) a combination of circuit elements that performs that
function or b) software in any form, including, therefore, firmware, microcode or the
like, combined with appropriate circuitry for executing that software to perform the
function. The invention as defined by such claims resides in the fact that the
functionalities provided by the various recited means are combined and brought
together in the manner which the claims call for. Applicant thus regards any means
that can provide those functionalities as equivalent to those shown herein.

In some video sequences, in particular those with fading, the current picture or 
image block to be coded is more strongly correlated to a reference picture scaled by 
a weighting factor than to the reference picture itself. Video encoders without
weighting factors applied to reference pictures encode fading sequences very 
inefficiently. 

In the Joint Video Team ("JVT") video compression standard, each P picture
can use multiple reference pictures to form a picture's prediction, but each individual
macroblock or macroblock partition (of size 16x8, 8x16, or 8x8) uses only a single
reference picture for prediction. In addition to coding and transmitting the motion
vectors, a reference picture index is transmitted for each macroblock or macroblock
partition, indicating which reference picture is used. A limited set of possible
reference pictures is stored at both the encoder and decoder, and the number of
allowable reference pictures is transmitted. Unlike in previous standards, such as
MPEG-2, a JVT encoder has considerable flexibility in that previously coded pictures
can be used as reference pictures.

In the JVT standard for bi-predictive pictures (also called "B" pictures), two
predictors are formed for each macroblock or macroblock partition, each of which can
be from a separate reference picture, and the two predictors are averaged together to
form a single averaged predictor. For bi-predictively coded motion blocks, the
reference pictures can both be from the forward direction, both be from the backward
direction, or one each from the forward and backward directions.

Two lists are maintained of the available reference pictures that may be used
for prediction. The two reference pictures are referred to as the List 0 and List 1
predictors. An index for each reference picture is coded and transmitted, ref_idx_l0
and ref_idx_l1, for the List 0 and List 1 reference pictures, respectively.

The JVT standard provides two modes of weighted prediction, which allow
weighting factors and/or offsets to be applied to reference pictures when forming a
prediction. The weighting factor to be used is based on the reference picture index
(or indices in the case of bi-prediction) for the current macroblock or macroblock
partition. The reference picture indices are either coded in the bitstream or may be
derived, such as for skipped or direct mode macroblocks. A single weighting factor
and single offset are associated with each reference picture index for all of the slices
of the current picture. In explicit mode, these parameters are coded in the slice
header. In implicit mode, these parameters are derived. The weighting factors and
offset parameter values are constrained to allow for 16-bit arithmetic operations in the
inter-prediction process. The encoder may select either implicit mode or explicit
mode for each coded picture.

JVT bi-predictive or "B" pictures allow adaptive weighting between the two
predictions, i.e., Pred = [(P0) * (Pred0)] + [(P1) * (Pred1)] + D, where P0 and P1 are
weighting factors, Pred0 and Pred1 are the reference picture predictions for List 0
and List 1 respectively, and D is an offset.

As shown in Figure 1, a standard video encoder is indicated generally by the
reference numeral 100. An input to the encoder 100 is connected in signal
communication with a non-inverting input of a summing junction 110. The output of
the summing junction 110 is connected in signal communication with a block
transform function 120. The transform 120 is connected in signal communication with
a quantizer 130. The output of the quantizer 130 is connected in signal
communication with a variable length coder ("VLC") 140, where the output of the VLC
140 is an externally available output of the encoder 100.

The output of the quantizer 130 is further connected in signal communication
with an inverse quantizer 150. The inverse quantizer 150 is connected in signal
communication with an inverse block transformer 160, which, in turn, is connected in
signal communication with a reference picture store 170. A first output of the
reference picture store 170 is connected in signal communication with a first input of
a motion estimator 180. The input to the encoder 100 is further connected in signal
communication with a second input of the motion estimator 180. The output of the
motion estimator 180 is connected in signal communication with a first input of a
motion compensator 190. A second output of the reference picture store 170 is
connected in signal communication with a second input of the motion compensator
190. The output of the motion compensator 190 is connected in signal
communication with an inverting input of the summing junction 110.

Turning to Figure 2, a video encoder with implicit reference picture weighting is
indicated generally by the reference numeral 200. An input to the encoder 200 is
connected in signal communication with a non-inverting input of a summing junction
210. The output of the summing junction 210 is connected in signal communication
with a block transformer 220. The transformer 220 is connected in signal
communication with a quantizer 230. The output of the quantizer 230 is connected in
signal communication with a VLC 240, where the output of the VLC 240 is an
externally available output of the encoder 200.

The output of the quantizer 230 is further connected in signal communication
with an inverse quantizer 250. The inverse quantizer 250 is connected in signal
communication with an inverse block transformer 260, which, in turn, is connected in
signal communication with a reference picture store 270. A first output of the
reference picture store 270 is connected in signal communication with a first input of
a reference picture weighting factor assignor 272. The input to the encoder 200 is
further connected in signal communication with a second input of the reference
picture weighting factor assignor 272. A second output of the reference picture store
270 is connected in signal communication with a second input of the motion estimator
280.

The input to the encoder 200 is further connected in signal communication with
a third input of the motion estimator 280. The output of the motion estimator 280,
which is indicative of motion vectors, is connected in signal communication with a first
input of a motion compensator 290. A third output of the reference picture store 270
is connected in signal communication with a second input of the motion compensator
290. The output of the motion compensator 290, which is indicative of a motion
compensated reference picture, is connected in signal communication with a first
input of a multiplier or reference picture weighting applicator 292. Although an
exemplary multiplier embodiment is shown, the reference picture weighting applicator
292 may be implemented in alternate ways, such as, for example, by a shift register.
The output of the reference picture weighting factor assignor 272, which is indicative
of a weighting factor, is connected in signal communication with a second input of the
reference picture weighting applicator 292. The output of the reference picture
weighting applicator 292 is connected in signal communication with an inverting input
of the summing junction 210.

Turning to Figure 3, a video encoder with explicit reference picture weighting is
indicated generally by the reference numeral 300. An input to the encoder 300 is
connected in signal communication with a non-inverting input of a summing junction
310. The output of the summing junction 310 is connected in signal communication
with a block transformer 320. The transformer 320 is connected in signal
communication with a quantizer 330. The output of the quantizer 330 is connected in
signal communication with a VLC 340, where the output of the VLC 340 is an
externally available output of the encoder 300.

The output of the quantizer 330 is further connected in signal communication
with an inverse quantizer 350. The inverse quantizer 350 is connected in signal
communication with an inverse block transformer 360, which, in turn, is connected in
signal communication with a reference picture store 370. A first output of the
reference picture store 370 is connected in signal communication with a first input of
a reference picture weighting factor assignor 372. The input to the encoder 300 is
further connected in signal communication with a second input of the reference
picture weighting factor assignor 372. A first output of the reference picture weighting
factor assignor 372, which is indicative of a weighting factor, is connected in signal
communication with a first input of a motion estimator 380. A second output of the
reference picture store 370 is connected in signal communication with a second input
of the motion estimator 380.

The input to the encoder 300 is further connected in signal communication with
a third input of the motion estimator 380. The output of the motion estimator 380,
which is indicative of motion vectors, is connected in signal communication with a first
input of a motion compensator 390. A third output of the reference picture store 370
is connected in signal communication with a second input of the motion compensator
390. The output of the motion compensator 390, which is indicative of a motion
compensated reference picture, is connected in signal communication with a first
input of a multiplier or reference picture weighting applicator 392. A second output of
the reference picture weighting factor assignor 372, which is indicative of a weighting
factor, is connected in signal communication with a second input of the reference
picture weighting applicator 392. The output of the reference picture weighting
applicator 392 is connected in signal communication with a first non-inverting input of
a summing junction 394. A third output of the reference picture weighting factor
assignor 372, which is indicative of an offset, is connected in signal communication
with a second non-inverting input of the summing junction 394. The output of the
summing junction 394 is connected in signal communication with an inverting input of
the summing junction 310.

As shown in Figure 4, a video decoder for explicit reference picture weighting
is indicated generally by the reference numeral 500. The video decoder 500 includes
a variable length decoder ("VLD") 510 connected in signal communication with an
inverse quantizer 520. The inverse quantizer 520 is connected in signal
communication with an inverse transformer 530. The inverse transformer 530 is
connected in signal communication with a first input terminal of a summing junction
540, where the output of the summing junction 540 provides the output of the video
decoder 500. The output of the summing junction 540 is connected in signal
communication with a reference picture store 550. The reference picture store 550 is
connected in signal communication with a motion compensator 560, which is
connected in signal communication with a first input of a multiplier or reference
picture weighting applicator 570. As will be recognized by those of ordinary skill in
the pertinent art, the decoder 500 for explicit weighted prediction may also be used
for implicit weighted prediction.

The VLD 510 is further connected in signal communication with a reference
picture weighting factor lookup 580 for providing a coefficient index to the lookup 580.
A first output of the lookup 580 is for providing a weighting factor, and is connected in
signal communication to a second input of the reference picture weighting applicator
570. The output of the reference picture weighting applicator 570 is connected in
signal communication to a first input of a summing junction 590. A second output of
the lookup 580 is for providing an offset, and is connected in signal communication to
a second input of the summing junction 590. The output of the summing junction 590
is connected in signal communication with a second input terminal of the summing
junction 540.

As shown in Figure 5, a picture cross-fade is indicated generally by the
reference numeral 600. The exemplary picture cross-fade 600 includes a fade-out or
starting picture 610, identified as FP0, and a fade-in or ending picture 612, identified
as FP1.

Turning now to Figure 6, an exemplary process for encoding video signal data
for an image block is indicated generally by the reference numeral 700. The process
700 is implemented with an encoder, such as the encoder 200 or 300 of Figures 2
and 3, respectively. The process 700 includes a start block 710 that passes control
to a decision block 712. The decision block 712 determines whether a cross-fade is
present, and, if none is present, passes control to a function block 713. The function
block 713 performs normal encoding and passes control to an end block 724.

However, if the decision block 712 finds a cross-fade, it passes control to a
function block 714. The function block 714 finds the fade-out starting point FP0, and
passes control to a function block 716, which finds the fade-in ending point FP1. The
block 716 passes control to a function block 718, which codes the fade-out start
picture FP0 and passes control to a function block 720. The block 720 codes the
fade-in end picture FP1 and passes control to a function block 722.

The function block 722, in turn, codes pictures disposed in display order
between FP0 and FP1, using weighted prediction with the picture FP0 as the list 0
reference and the picture FP1 as the list 1 reference. The function block 722 passes
control to the end block 724.


An authoring tool used for video cross-fades between a pair of pictures
includes a video encoder, such as the encoder 200 of Figure 2, and operates on
pre-stored video content. In addition to the uncompressed video content, some additional
information may be available such as decision lists and editing splice points. The
video encoder in an authoring tool does not necessarily need to operate in real time.
Special effects such as fades and cross-fades can be applied in the authoring tool.

Various techniques are well known for detecting fades and cross-fades, also
known as dissolves, in video sequences. When encoding a particular picture, for
each macroblock or macroblock partition, a JVT encoder must select a coding
decision mode, one or two reference pictures, and one or more motion vectors.
When a JVT encoder uses weighted prediction, once per picture or slice it may also
select a weighting factor to be applied for each reference index used. One or more
reference indices refer to each allowable reference picture, so multiple weights can
be used for each individual reference picture.

The authoring tool detects when a cross-fade is taking place. The authoring
tool has sufficient information to detect when a cross-fade is taking place either
because it applied the cross-fade itself, or because it read it from a decision list, or
because it employs a fade detection algorithm. For a cross-fade, a picture identified
as the fade-out starting point is identified as FP0 and the fade-in ending point picture
is identified as FP1. When a cross-fade is detected, the encoder codes pictures FP0
and FP1 prior to coding the pictures between FP0 and FP1 in display order, which
are referred to as the cross-fade pictures. Thus, a feature of the present invention is
that the fade-in end picture, FP1, is coded before the intermediate pictures.
It is common in video encoders to use a fixed pattern of the I, P and B picture
coding types, and for the coding order to differ from the display order. For example,
such a common pattern might comprise:

Common Coding Order: I0 P3 B1 B2 P6 B4 B5 P9 B7 B8
Common Display Order: I0 B1 B2 P3 B4 B5 P6 B7 B8 P9

For this common pattern, picture P3 is coded before the intermediate B1 and
B2 pictures. The B1 and B2 pictures use I0 and P3 as reference pictures in their
prediction process.

The JVT standard does not require the use of fixed picture coding type
patterns, and does not suggest methods by which an encoder can adjust the patterns
to maximize coding efficiency. In accordance with the current invention, coding
efficiency of cross-fading sequences can be improved by adjusting picture coding
type and coding order. If, for example, picture 0 and picture 9 were identified as the
fade-out start and fade-in end pictures, respectively, the following coding and display
order could be used:

Inventive Coding Order: I0 P9 B1 B2 B3 B4 B5 B6 B7 B8
Inventive Display Order: I0 B1 B2 B3 B4 B5 B6 B7 B8 P9
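The reordering for a cross-fade can be sketched in a few lines of Python. This is a minimal illustration only: the helper name crossfade_coding_order and the picture labels (I0, P9, B1, ...) are assumptions made for the example, and the start picture is assumed to be intra coded as in the orders listed here.

```python
def crossfade_coding_order(fp0, fp1):
    """Return (coding_order, display_order) picture labels for a cross-fade.

    fp0 and fp1 are the display indices of the fade-out start picture and
    the fade-in end picture. Both end-points are coded first; the pictures
    between them are then coded as B pictures in display order.
    """
    if fp1 <= fp0:
        raise ValueError("fp1 must follow fp0 in display order")
    start = f"I{fp0}"  # assumption: the start picture is intra coded
    end = f"P{fp1}"
    middle = [f"B{n}" for n in range(fp0 + 1, fp1)]
    coding = [start, end] + middle
    display = [start] + middle + [end]
    return coding, display


coding, display = crossfade_coding_order(0, 9)
print("Coding order: ", " ".join(coding))
print("Display order:", " ".join(display))
```

With end-points 0 and 9 the sketch reproduces the two orders listed for the cross-fade case.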

When a cross-fade picture is encoded, the encoder orders the reference
picture lists, using reference picture selection reordering if necessary, such that FP0
is the first picture on List 0 and FP1 is the first picture on List 1. This provides
additional coding efficiency, because the reference index of 0, which refers to the first
picture in the reference picture list, can be coded using fewer bits than
other reference indices. Then a weighting factor is selected for the reference indices
corresponding to each of FP0 and FP1, based on the relative contribution of the first
picture and the second picture in the composition of the current picture. If the formula
used in creating the cross-fade picture is known, either because the authoring tool
created the cross-fade, or from side-information, then the weighting factor from the
composition formula can be used. If the exact formula is not known, a weighting
factor can be computed using any of several different algorithms, such as those
based on relative distance of the current picture from FP0 and FP1, for example.
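One possible distance-based rule is sketched below. It is an assumption made for illustration, not a rule stated in this description: the weights vary linearly with display distance and sum to 1, and the helper name distance_weights is hypothetical. How such weights map onto the syntax of a particular weighted prediction mode is outside the sketch.

```python
from fractions import Fraction


def distance_weights(cur, fp0, fp1):
    """Return (w0, w1), the relative contributions of FP0 and FP1.

    cur, fp0 and fp1 are display indices with fp0 < cur < fp1. The closer
    the current picture is to FP0, the larger the weight given to FP0.
    """
    if not fp0 < cur < fp1:
        raise ValueError("cur must lie strictly between fp0 and fp1")
    span = fp1 - fp0
    w0 = Fraction(fp1 - cur, span)  # contribution of the fade-out start picture
    w1 = Fraction(cur - fp0, span)  # contribution of the fade-in end picture
    return w0, w1


for n in range(1, 9):
    w0, w1 = distance_weights(n, 0, 9)
    print(n, float(w0), float(w1))
```

Exact fractions are used so that the two weights always sum to exactly 1; a real encoder would quantize them to the precision its bitstream syntax allows.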

This above described algorithm can be applied for all coded pictures in the
cross-fade region, or may be applied only for those pictures that are marked to be
stored as reference pictures. In alternate embodiments, either implicit mode or
explicit mode weighted prediction may be used to code the cross-fade pictures. When
explicit mode is used, any weighting factors may be used. When implicit mode is
used, the weighting factors depend on the relative distance of the current picture from
FP0 and FP1.

This system and technique may be applied to either Predictive "P" pictures,
which are encoded with a single predictor, or to Bi-predictive "B" pictures, which are
encoded with two predictors. The decoding processes, which are present in both
encoder and decoders, are described below for the P and B picture cases.
Alternatively, this technique may also be applied to coding systems using the
concepts similar to I, B, and P pictures.

The same weighting factors can be used for single directional prediction in B
pictures and for bi-directional prediction in B pictures. When a single predictor is used
for a macroblock in P pictures or for single directional prediction in B pictures, a
single reference picture index is transmitted for the block. After the decoding process
step of motion compensation produces a predictor, the weighting factor is applied to
the predictor. The weighted predictor is then added to the coded residual, and clipping is
performed on the sum, to form the decoded picture. For blocks in P pictures or for
blocks in B pictures that use only List 0 prediction, the weighted predictor is
formed as:

Pred = W0 * Pred0 + D0 (1)

where W0 is the weighting factor associated with the List 0 reference picture,
D0 is the offset associated with the List 0 reference picture, and Pred0 is the motion-
compensated prediction block from the List 0 reference picture.

For blocks in B pictures that use only List 1 prediction, the weighted
predictor is formed as:

Pred = W1 * Pred1 + D1 (2)

where W1 is the weighting factor associated with the List 1 reference picture,
D1 is the offset associated with the List 1 reference picture, and Pred1 is the motion-
compensated prediction block from the List 1 reference picture.

The weighted predictors may be clipped to guarantee that the resulting values
will be within the allowable range of pixel values, typically 0 to 255. The precision of
the multiplication in the weighting formulas may be limited to any pre-determined
number of bits of resolution.

In the bi-predictive case, reference picture indexes are transmitted for each of
the two predictors. Motion compensation is performed to form the two predictors.
Each predictor uses the weighting factor associated with its reference picture index to
form two weighted predictors. The two weighted predictors are then averaged
together to form an averaged predictor, which is then added to the coded residual.

For use for blocks in B pictures that use List 0 and List 1 predictions, the
weighted predictor is formed as:

Pred = (P0 * Pred0 + D0 + P1 * Pred1 + D1) / 2 (3)

Clipping may be applied to the weighted predictor or any of the intermediate
values in the calculation of the weighted predictor to guarantee that the resulting
values will be within the allowable range of pixel values, typically 0 to 255.
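The single-list and bi-predictive weighted predictors can be illustrated with a short Python sketch. It is not the normative JVT arithmetic: the function names, the floating-point weights, and the round-half-up rounding are assumptions made for the example, and clipping is applied only to the final value.

```python
import math


def clip_pixel(value, lo=0, hi=255):
    """Clip a value to the allowable pixel range."""
    return max(lo, min(hi, value))


def weighted_single(pred, w, d):
    """Single-list weighted predictor: Pred = W * Pred_x + D, then clipped."""
    return [clip_pixel(math.floor(w * p + d + 0.5)) for p in pred]


def weighted_bi(pred0, pred1, p0, p1, d0, d1):
    """Bi-predictive weighted predictor, formula (3), then clipped."""
    out = []
    for a, b in zip(pred0, pred1):
        value = (p0 * a + d0 + p1 * b + d1) / 2
        out.append(clip_pixel(math.floor(value + 0.5)))
    return out


# Two-sample blocks stand in for motion-compensated prediction blocks.
print(weighted_single([100, 200], 0.5, 10))
print(weighted_bi([100, 200], [50, 250], 1.0, 1.0, 0, 0))
```

With both weights equal to 1 and zero offsets, the bi-predictive case reduces to the plain average of the two predictors.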

Thus, a weighting factor is applied to the reference picture prediction of a
video compression encoder and decoder that uses multiple reference pictures. The
weighting factor adapts for individual motion blocks within a picture, based on the
reference picture index that is used for that motion block. Because the reference
picture index is already transmitted in the compressed video bitstream, the additional
overhead to adapt the weighting factor on a motion block basis is dramatically
reduced. All motion blocks that are coded with respect to the same reference picture
apply the same weighting factor to the reference picture prediction.

In the Joint Model ("JM") software of the JVT committee, an a posteriori
method using rate distortion optimization is used for selection of motion vectors,
macroblock partitioning, prediction mode, and reference picture indices. In this
method, a range of allowable values for each of these choices is tested and a cost is
determined for each choice. The choice that leads to the minimum cost is selected.
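Selecting the minimum-cost choice can be illustrated as follows. The sketch assumes a Lagrangian cost of the form J = D + lambda * R, which is a common form in rate distortion optimization; the helper name select_min_cost, the candidate tuples, and the numbers are hypothetical and are not taken from the Joint Model software.

```python
def select_min_cost(candidates, lam):
    """Return the candidate with the lowest cost J = D + lam * R.

    candidates is a list of (name, distortion, rate_bits) tuples.
    lam is the Lagrange multiplier trading distortion against rate.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])


# Hypothetical partition choices with made-up distortion and rate figures.
choices = [("16x16", 120, 10), ("8x8", 90, 40)]
print(select_min_cost(choices, lam=2.0)[0])
print(select_min_cost(choices, lam=0.5)[0])
```

A larger multiplier penalizes rate more heavily and so favors the cheaper-to-code choice, while a smaller one favors the lower-distortion choice.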

Motion estimation techniques have been widely studied. For each motion
block of a picture being coded, a motion vector is chosen that represents a
displacement of the motion block from a reference picture. In an exhaustive search
method within a search region, every displacement within a pre-determined range of
offsets relative to the motion block position is tested. The test includes calculating
the sum of the absolute difference ("SAD") or mean squared error ("MSE") of each
pixel in the motion block in the current picture with the displaced motion block in a
reference picture. The offset with the lowest SAD or MSE is selected as the motion
vector. Numerous variations on this technique have been proposed, such as three
step search and rate-distortion optimized motion estimation, all of which include the
step of computing the SAD or MSE of the current motion block with a displaced
motion block in a reference picture.
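The exhaustive search can be sketched as follows. This is a minimal illustration using SAD on pictures stored as lists of rows; the function names, the square block shape, and the rule of skipping candidates that fall outside the reference picture are assumptions made for the example.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Test every offset within +/- search_range; return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would read outside the reference picture.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 8x8 pictures: the current picture is the reference shifted
# so that the block at (2, 2) matches the reference at offset (2, 1).
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, size=2, search_range=2))
```

Replacing the absolute difference with a squared difference gives the MSE variant up to a constant factor.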
Computational costs for determining motion vectors and adaptive reference
picture weighting factors can be reduced by using an iterative process, while still
selecting motion vectors and weighting factors that are able to achieve high
compression efficiencies. An exemplary embodiment motion vector and weighting
factor determination process is described assuming that a single weighting factor is
applied to the entire reference picture, although the principles of the invention should
not be construed as being so limited. The process could also be applied over smaller
regions of the picture, such as slices, for example. In addition, although one
exemplary embodiment is described as using only a single reference picture, the
principles may also be applied to multiple reference picture prediction and to
bi-predictive pictures.
''"TairlXo.theino.ionvector.oramotionbiockcantypicallyb^^^^ 

^ rr:o:r!==z^^^^^^^^ 

plre pixel values. The weighting .actor may be limited to a number o. b,ts of 
picture pixel veiluc thpre is no need to consider the 

resolution. If the weighting factor .s very close to 1 . there .s 
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weighting factor in the motion estimation process, and normal motion estimation can 
be done with the weighting factor assumed to be equal to 1. Otherwise, the 
weighting factor estimate is applied to the reference picture. Motion estimation is 
then performed using any method which calculates SAD or MSE, but with the SAD or 
MSE calculation performed between the current picture motion block and the
displaced motion block in the weighted version of the reference picture, rather than
the un-weighted reference picture. The estimation of the weighting factor can be 
refined after the motion vectors have been selected, if necessary. 

The current motion vectors are applied to the weighted reference picture to
form the weighted, motion compensated reference picture. A difference measure
between the weighted, motion compensated reference picture and the current picture
is computed. If the difference measure is lower than a threshold, or lower than the 
previous best difference measure, the process is complete, and the current candidate 
motion vectors and weighting factor are accepted. 

If the difference measure is higher than some threshold, the weighting factor
can be refined. In this case, a motion compensated but un-weighted reference
picture is formed based on the current candidate motion vectors. The weighting
factor estimate is refined using the motion compensated reference picture and the
current picture, rather than using the un-compensated reference picture, as was done
in forming the initial estimate of the weighting factor.

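
The iterative process described above can be sketched as follows, using the ratio-of-averages weight estimate and the absolute-sum difference measure that the description defines next. This is a simplified sketch under stated assumptions and not the described encoder: the function names, the edge handling, the "close to 1" tolerance, the iteration limit, and the stopping rule (stop once the difference measure falls below a threshold, otherwise keep the best result seen) are all choices made for the example. Picture dimensions are assumed to be multiples of the block size.

```python
def avg(pic):
    return sum(map(sum, pic)) / (len(pic) * len(pic[0]))


def pixel(pic, x, y):
    # Edge pixels are repeated for displacements that leave the picture (an assumption).
    return pic[min(max(y, 0), len(pic) - 1)][min(max(x, 0), len(pic[0]) - 1)]


def search_block(cur, wref, bx, by, size, rng):
    """Full search on the weighted reference; returns the lowest-SAD offset."""
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            cost = sum(abs(cur[by + y][bx + x] - pixel(wref, bx + x + dx, by + y + dy))
                       for y in range(size) for x in range(size))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1]


def compensate(ref, mvs, size):
    """Build the motion compensated, un-weighted reference picture."""
    out = [[0.0] * len(ref[0]) for _ in ref]
    for (bx, by), (dx, dy) in mvs.items():
        for y in range(size):
            for x in range(size):
                out[by + y][bx + x] = pixel(ref, bx + x + dx, by + y + dy)
    return out


def estimate(cur, ref, size=4, rng=2, threshold=1.0, near_one=0.02, max_iter=4):
    w = avg(cur) / avg(ref)  # initial estimate from the un-compensated reference
    best = None
    for _ in range(max_iter):
        if abs(w - 1.0) < near_one:
            w = 1.0  # close enough to 1: fall back to normal motion estimation
        wref = [[w * p for p in row] for row in ref]
        mvs = {(bx, by): search_block(cur, wref, bx, by, size, rng)
               for by in range(0, len(cur), size)
               for bx in range(0, len(cur[0]), size)}
        mcref = compensate(ref, mvs, size)
        diff = abs(sum(c - w * m for crow, mrow in zip(cur, mcref)
                       for c, m in zip(crow, mrow)))
        if best is None or diff < best[0]:
            best = (diff, w, mvs)
        if diff < threshold:
            break
        w = avg(cur) / avg(mcref)  # refine using the motion compensated reference
    return best


# Toy fade: the reference shifted right by one pixel at half brightness.
ref = [[20 + 8 * ((3 * x + 5 * y) % 7) + 3 * y for x in range(8)] for y in range(8)]
cur = [[0.5 * ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
diff, w, mvs = estimate(cur, ref)
```

For this toy input the first pass already recovers a weighting factor of 0.5 and an offset of (-1, 0) for every block, so no refinement pass is needed. Real content would generally require the refinement step, since the initial ratio of averages ignores motion.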
In one embodiment, the initial estimate of the weighting factor, w, is the ratio
of the average value of the pixels in the current picture, cur, to the
average value of the pixels in the reference picture, ref, where:

w = avg(cur) / avg(ref)    (4)

The refinement estimates are the ratio between the average of pixels in the 
current picture and the average of pixels in the motion compensated reference 
picture, mcref, where:

w = avg(cur) / avg(mcref)    (5)

The difference measure diff is the absolute value of the average of pixel
differences between the current picture, cur, and the weighted motion compensated
reference picture, wmcref, where:

diff = | Σ (cur - wmcref) |    (6)

In another embodiment, the difference measure is the sum of the absolute
differences of the pixels in the current picture and in the weighted motion
compensated reference picture, where:

diff = Σ | cur - wmcref |    (7)
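
The two difference measures described above behave differently when positive and negative pixel differences cancel. The following sketch illustrates this; the function names are hypothetical, and pixels are given as flat lists for brevity, which is an assumption of the example.

```python
def diff_abs_of_sum(cur, wmcref):
    """Absolute value of the summed pixel differences."""
    return abs(sum(c - p for c, p in zip(cur, wmcref)))


def diff_sum_of_abs(cur, wmcref):
    """Sum of the absolute pixel differences (SAD)."""
    return sum(abs(c - p) for c, p in zip(cur, wmcref))


# Differences of -10 and +10 alternate, so they cancel in the first measure only.
cur = [10, 20, 30, 40]
wmcref = [20, 10, 40, 30]
first = diff_abs_of_sum(cur, wmcref)
second = diff_sum_of_abs(cur, wmcref)
```

Here the first measure is 0 while the second is 40. The first measure tracks overall brightness agreement, which is what the weighting factor controls, whereas the second also penalizes per-pixel mismatch.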



When block-based motion estimation is performed, the same pixel in a
reference picture is used for numerous SAD calculations. In an exemplary
embodiment during the motion estimation process, once a weighting factor has been
applied to a pixel in a reference picture, the weighted pixel is stored, in addition to the
normal pixel. The storage may be done either for a region of the picture, or for the
entire picture.
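
The benefit of storing weighted pixels can be sketched as follows. This is an illustrative sketch, not the described implementation: the class name, the lazy dictionary cache, and the multiplication counter are assumptions introduced only to make the reuse visible.

```python
class WeightedReference:
    """Keeps the normal reference pixels and caches their weighted versions."""

    def __init__(self, ref, weight):
        self.ref = ref
        self.weight = weight
        self._cache = {}
        self.multiplies = 0  # counts weighting operations, for illustration only

    def weighted_pixel(self, x, y):
        key = (x, y)
        if key not in self._cache:
            self._cache[key] = self.weight * self.ref[y][x]
            self.multiplies += 1
        return self._cache[key]


def weighted_sad(cur_block, wref, x0, y0):
    """SAD between a current block and the weighted reference block at (x0, y0)."""
    return sum(abs(c - wref.weighted_pixel(x0 + x, y0 + y))
               for y, row in enumerate(cur_block)
               for x, c in enumerate(row))


ref = [[x + 4 * y for x in range(4)] for y in range(4)]
wref = WeightedReference(ref, 0.5)
block = [[1.0, 1.5], [3.0, 3.5]]  # equals 0.5 * ref at position (2, 0)

# Nine candidate positions read 36 pixels in total, but only 16 are distinct.
costs = [weighted_sad(block, wref, x0, y0) for y0 in range(3) for x0 in range(3)]
```

In this example the nine SAD calculations trigger only 16 weighting multiplications instead of 36, because each weighted pixel is computed once and then reused.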

The weighted reference picture values may be clipped to be stored with the 
same number of bits as an unweighted reference, such as 8 bits, for example, or may 
be stored using more bits. If clipping is performed for the motion compensation 
process, which is more memory efficient, the weighting factor is reapplied to the 
reference picture for the actual selected motion vector, the difference is calculated 
using additional bits, and the clipping is performed after the difference in order to 
avoid mismatch with a decoder, which might otherwise occur if the decoder does not
perform clipping after the weighting factor is applied.
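
The following sketch illustrates why the point at which clipping is applied matters. It is an assumption-laden illustration rather than the described process: the rounding rule, the function names, and the sample values are hypothetical, and only a single pixel is considered.

```python
import math


def weight_pixel(pixel, weight):
    """Weighted reference sample at full precision, rounded to the nearest integer."""
    return math.floor(weight * pixel + 0.5)


def clip(value, bits=8):
    """Clip a sample to the range representable in the given number of bits."""
    return min(max(value, 0), (1 << bits) - 1)


ref_pixel, cur_pixel, weight = 200, 250, 1.5

stored_clipped = clip(weight_pixel(ref_pixel, weight))  # fits in 8 bits
stored_wide = weight_pixel(ref_pixel, weight)           # needs more than 8 bits

residual_clipped = cur_pixel - stored_clipped
residual_wide = cur_pixel - stored_wide
```

With these values the clipped store holds 255 while the wider store holds 300, giving residuals of -5 and -50 respectively. A search may use the memory-efficient clipped store, but the final difference should be formed the same way the decoder forms its prediction, which is why the weighting factor is reapplied with additional bits for the selected motion vector.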

When multiple reference pictures are used to encode a picture, a separate 
weighting factor can be calculated for each reference picture. During motion 
estimation, a motion vector and a reference picture index are selected for each 
motion block. For each iteration of the process, motion vectors and weighting factors
are found for each reference picture.

In a preferred embodiment, during motion estimation, the best reference 
picture for a given motion block is determined. Calculation of the difference measure
is done separately for each reference picture, with only those motion blocks that use
that reference picture being used in the calculation. Refinement of the weighting
factor estimate for a given reference picture also uses only those motion blocks that
are coded using that reference picture. For bi-predictive coding, weighting factors
and motion vectors can be determined separately for each of the two predictions,
which will be averaged together to form the averaged prediction.
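
The per-reference refinement described above can be sketched as follows. This is a minimal sketch under stated assumptions: each motion block is represented as a hypothetical tuple of its reference picture index, its current-picture pixels, and its motion compensated reference pixels, and the ratio of sums is used because it equals the ratio of averages over the same set of pixels.

```python
from collections import defaultdict


def refine_weights(blocks):
    """Return one refined weighting factor per reference picture index.

    blocks: iterable of (ref_idx, cur_pixels, mcref_pixels), one entry per motion block.
    Only the blocks that use a given reference contribute to its weighting factor.
    """
    cur_sum = defaultdict(float)
    ref_sum = defaultdict(float)
    for ref_idx, cur_pixels, mcref_pixels in blocks:
        cur_sum[ref_idx] += sum(cur_pixels)
        ref_sum[ref_idx] += sum(mcref_pixels)
    # Skip references whose compensated pixels sum to zero to avoid dividing by zero.
    return {idx: cur_sum[idx] / ref_sum[idx] for idx in cur_sum if ref_sum[idx] != 0}


# Hypothetical blocks: two coded from reference 0 and one from reference 1.
blocks = [
    (0, [10, 20], [20, 40]),
    (1, [30, 30], [10, 10]),
    (0, [5, 5], [10, 10]),
]
weights = refine_weights(blocks)
```

For these hypothetical blocks, reference 0 receives a weighting factor of 0.5 and reference 1 receives 3.0, each computed without regard to the blocks coded from the other reference.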

The principles of the present invention can be applied to many different types 
of motion estimation algorithms. When used with hierarchical approaches, the 
iteration of weighting factor selection and motion vector selection can be used with
any level of the motion estimation hierarchy. For example, the iterative approach
could be used with integer picture element ("pel") motion estimation. After the 
weighting factor and integer motion vectors are found using the provided iterative 
algorithm, the sub-pel motion vectors may be found without requiring another iteration 
of the weighting factor selection. 

These and other features and advantages of the present invention may be
readily ascertained by one of ordinary skill in the pertinent art based on the teachings
herein. It is to be understood that the principles of the present invention may be
implemented in various forms of hardware, software, firmware, special purpose
processors, or combinations thereof.

Most preferably, the principles of the present invention are implemented as a
combination of hardware and software. Moreover, the software is preferably
implemented as an application program tangibly embodied on a program storage 
unit. The application program may be uploaded to, and executed by, a machine 
comprising any suitable architecture. Preferably, the machine is implemented on a
computer platform having hardware such as one or more central processing units
("CPU"), a random access memory ("RAM"), and input/output ("I/O") interfaces. The 
computer platform may also include an operating system and microinstruction code. 
The various processes and functions described herein may be either part of the 
microinstruction code or part of the application program, or any combination thereof,
which may be executed by a CPU. In addition, various other peripheral units may be
connected to the computer platform such as an additional data storage unit and a 
printing unit.

It is to be further understood that, because some of the constituent system
components and methods depicted in the accompanying drawings are preferably
implemented in software, the actual connections between the system components or
the process function blocks may differ depending upon the manner in which the
present invention is programmed. Given the teachings herein, one of ordinary skill in
the pertinent art will be able to contemplate these and similar implementations or
configurations of the present invention.

Although the illustrative embodiments have been described herein with
reference to the accompanying drawings, it is to be understood that the present
invention is not limited to those precise embodiments, and that various changes and
modifications may be effected therein by one of ordinary skill in the pertinent art
without departing from the scope or spirit of the present invention. All such changes
and modifications are intended to be included within the scope of the present
invention as set forth in the appended claims.

CLAIMS

1. A video encoder (200, 300) for encoding video signal data for at least
one cross-fade picture disposed temporally between a fade-out start picture and a
fade-in end picture, which are used as reference pictures for coding the at least one
cross-fade picture, the encoder comprising:

a reference picture weighting applicator (292, 392); and
a reference picture weighting factor unit (272, 372) in signal communication
with the reference picture weighting applicator for assigning weighting factors
corresponding to each of the fade-out start picture and the fade-in end picture,
respectively, for coding the at least one cross-fade picture.

2. A video encoder as defined in Claim 1, further comprising a motion
compensation unit (290, 390) in signal communication with the reference picture
weighting applicator for providing at least one of a motion compensated fade-out start
picture and a motion compensated fade-in end picture responsive to the reference
picture weighting factor unit for coding the at least one cross-fade picture.

3. A video encoder as defined in Claim 2, further comprising a reference
picture store (270, 370) in signal communication with each of the reference picture
weighting factor unit and the motion compensation unit for storing each of the
fade-out start picture and the fade-in end picture.

4. A video encoder as defined in Claim 2 wherein the reference picture
weighting applicator applies a weighting factor selected by the reference picture
weighting factor unit to at least one of the motion compensated fade-out start picture
and the motion compensated fade-in end picture.

5. A video encoder as defined in Claim 4 usable with bi-predictive picture
predictors, the encoder further comprising prediction means for forming first and
second predictors from the weighted and motion compensated fade-out start and
fade-in end pictures, respectively.

6. A video encoder as defined in Claim 5 wherein the weighted and motion
compensated fade-out start and fade-in end pictures, respectively, are each from
opposite directions relative to all of the at least one cross-fade pictures.

7. A video encoder as defined in Claim 1, further comprising a motion
estimation unit (380) in signal communication with the reference picture weighting 
factor unit for providing motion estimation responsive to weighting factor in an explicit 
mode of operation. 

8. A video encoder as defined in Claim 2, further comprising a summing 
unit (394) in signal communication with the reference picture weighting factor unit for 
applying an offset to the weighted motion compensated reference picture in an 
explicit mode of operation. 

9. A method (700) for encoding cross-fades between pictures, the method 
comprising: 

identifying pictures for which a cross-fade is defined; 

determining (714, 716) appropriate end-points from pictures for which said
cross-fade is defined; and

encoding (718, 720) said end-points prior to encoding (722) at least one picture
intermediate to said end-points.

10. A method as defined in Claim 9 wherein said end-points from pictures 
for which said cross-fade is defined are used as reference pictures when encoding at 
least one picture intermediate to said end-points. 

11. A method as defined in Claim 9, further comprising: 

receiving a substantially uncompressed fade-out start picture; receiving a 
substantially uncompressed fade-in end picture; 

assigning a weighting factor for the at least one cross-fade picture corresponding to the
fade-out start picture; and

assigning a weighting factor for the at least one cross-fade picture corresponding to the
fade-in end picture.



12. A method as defined in Claim 11, further comprising:

computing motion vectors corresponding to the difference between the at least
one cross-fade picture and at least one of the fade-out start picture and the fade-in
end picture;

motion compensating the at least one of the fade-out start picture and the
fade-in end picture in correspondence with the motion vectors;

multiplying the motion compensated at least one of the fade-out start picture
and the fade-in end picture by the assigned weighting factor, respectively, to form at
least one weighted motion compensated reference picture; and

subtracting the at least one weighted motion compensated reference picture
from the at least one cross-fade picture; and encoding a signal indicative of the
difference between the at least one cross-fade picture and the at least one weighted
motion compensated reference picture.

13. A method as defined in Claim 12 wherein exactly two reference pictures
are used, the exactly two reference pictures comprising the pre-coded fade-out start
picture, FP0, and the fade-in end picture, FP1.

14. A method as defined in Claim 13, further comprising:

combining the motion compensated fade-out start picture with the motion
compensated fade-in end picture prior to subtracting from the at least one cross-fade
picture.

15. A method as defined in Claim 12 wherein computing motion vectors
comprises:

testing within a search region for every displacement within a pre-determined
range of offsets relative to the at least one cross-fade picture;

calculating at least one of the sum of the absolute difference and the mean
squared error of each pixel in the at least one cross-fade picture with a motion
compensated reference picture; and

selecting the offset with the lowest sum of the absolute difference and mean 
squared error as the motion vector.

16. A method as defined in Claim 12 wherein computing motion vectors
comprises:

testing within a search region for every displacement within a pre-determined
range of offsets relative to the at least one cross-fade picture;

calculating at least one of the sum of the absolute difference and the mean 
squared error of each pixel in the at least one cross-fade picture with a first motion 
compensated reference picture corresponding to the fade-out start picture; 

selecting an offset with the lowest sum of the absolute difference and mean 
squared error as the motion vector for the fade-out start picture; 

calculating at least one of the sum of the absolute difference and the mean 
squared error of each pixel in the image block with a second motion compensated 
reference picture corresponding to the fade-in end picture; and 

selecting an offset with the lowest sum of the absolute difference and mean 
squared error as the motion vector for the fade-in end picture. 

17. A method as defined in Claim 11 wherein the weighting factors for the
fade-out start picture and the fade-in end picture, respectively, are each responsive to 
the relative distance between the at least one cross-fade picture and the fade-out 
start picture or the fade-in end picture, respectively, in an implicit mode of operation. 

18. A video CODEC comprising an encoder as defined in Claim 1 and a 
decoder (500) for decoding video signal data for a cross-fade picture relative to each 
of a fade-out start picture and a fade-in end picture to predict the cross-fade picture, 
the decoder comprising a reference picture weighting factor unit (580) having an 
output for determining weighting factors corresponding to each of the fade-out start 
picture and the fade-in end picture. 

19. A video CODEC as defined in Claim 18 wherein the reference picture
weighting factor unit has a second output for determining offsets corresponding to 
each of the fade-out start picture and the fade-in end picture.

20. A video CODEC as defined in Claim 18, further comprising a variable
length decoder (510) in signal communication with the reference picture weighting 
factor unit for providing indices corresponding to each of the fade-out start picture 
and the fade-in end picture to the reference picture weighting factor unit. 

21. A video CODEC as defined in Claim 18, further comprising a motion
compensator (560) in signal communication with the reference picture weighting 
factor unit for providing motion compensated reference pictures responsive to the 
reference picture weighting factor unit. 

22. A video CODEC as defined in Claim 21, further comprising a reference
picture weighting applicator (570) in signal communication with the motion 
compensator and the reference picture weighting factor unit for applying a weighting 
factor to each motion compensated reference picture. 

23. A video CODEC as defined in Claim 21, further comprising an adder
(590) in signal communication with the motion compensator and the reference picture 
weighting factor unit for applying an offset to each motion compensated reference 
picture. 

24. A video CODEC as defined in Claim 18 wherein the video signal data is 
streaming video signal data comprising block transform coefficients. 

25. A video CODEC as defined in Claim 18 usable with bi-predictive picture
predictors, the decoder further comprising:

prediction means for forming first and second predictors from two different 
reference pictures; 

averaging means for averaging the first and second predictors together using 
their corresponding weighting factors to form a single averaged predictor. 
